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An algorithm for sampling exactly from the normal distribution is given. The algorithm reads some number of 
uniformly distributed random digits in a given base and generates an initial portion of the representation of a 
normal deviate in the same base. Thereafter, uniform random digits are copied directly into the representation 
of the normal deviate. Thus, in constrast to existing methods, it is possible to generate normal deviates exactly 
rounded to any precision with a mean cost that scales linearly in the precision. The method performs no arbitrary 
precision arithmetic, calls no transcendental functions, and, indeed, uses no floating point arithmetic whatsoever; 
it uses only simple integer operations. The algorithm is inspired by von Neumann's algorithm for sampling from 
the exponential distribution; an improvement to von Neumann's algorithm is also given. 
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1. INTRODUCTION 



Random variables with a normal distribution, 
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are widely used in Monte Carlo simulations. Over the past 
sixty years, scores of algorith ms for generating suc h normal 
deviates have been published dThomas et al. , 2007). In this 
paper, I give another algorithm with the distinguishing fea- 
ture that, given a source of uniformly distributed random dig- 
its in some base 6, it generates exact normal deviates. In or- 
der to make the meaning of "exact" precise, consider Table Q] 
which illustrates the operation of the algorithm using b = 10 
(with the implementation given in the appendix). The first col- 
umn shows the input to the algorithm, which, in this example, 
is taken from s uccessive lines of the table of random digits 
produced by the [RAND C orporation! (1 19551) . beginning at line 
13677 (on p. 274); the resulting normal deviates are given in 
the second column. Referring to the first line of this table, 
the algorithm reads random decimal digits, 47152 7, from the 
source and produces +0.2 as the initial portion of the deci- 
mal representation of a normal deviate. Thereafter, random 
digits can be copied directly from the input to the output, in- 
dicated by the ellipses (. . .) in the table; thus +0.2 . . . repre- 
sents a uniform random sample in the range (0.2,0.3). The 
next digits of the random sequence are 7172 6817 . . .; thus the 
normal deviate can be exactly rounded to 8 decimal digits as 
0.27172 682. Alternatively, a normal deviate can be exactly 
compared against a constant 0.5, for example, by adding at 
most one additional digit to the results in the table. (The re- 
sults in Table Q] are not "typical," because the starting line in 
the table of random digits was specifically chosen to limit the 
number of random digits used.) 

It's clear from this example that rounding exactly to an 
IEEE double precision number is straightforward (particu- 



TABLE 1 Sample input and output for Algorithm N with b = 10. 
The input consists of uniformly distributed random decimal digits 
and the output gives the resulting normal deviates. 



input 



output 
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47152 7... +0.2... 

70314 19047 5 . . . +0. . . . 

7413163178 62857... +2.2... 

20147 9673... -1.6... 

61921 13332 0... -1.132. 



larly if b is a power of two); i.e., the method can be used 
to generate de viates that satisf y the conditions of "ideal ap- 
proximation" (Monahan, 1985), namely that the algorithm is 
equivalent to sampling a real number from the normal dis- 
tribution and rounding it to the nearest representable floating 
point number. Other sampling methods are fr equently referred 
to as "exact," for example t he polar method ([B ox and Muller, 
119581) and the ratio method (Kinder man and Monahanl ll977); 
but these are merely "accurate to round off" which, in prac- 
tice, means only that the accuracy is commensurate with the 
precision of the floating point number system. Furthermore, 
for applications requiring high precision normal deviates, the 
new algorithm offers ideal scaling. There's an amortized con- 
stant cost to producing the initial portion of the normal de- 
viate; but, thereafter, the digits can be added to the result at 
a rate limited only by the cost of producing and copying the 
random digits. 

It's not immediately obvious that such an algorithm for ex- 
act sampling is possible. However, in the early years of the 
era of modern computing, Ivon Neumann! (1 1 95 lb presented a 
remarkably simple algorithm for sampling from the exponen- 
tial distribution. 

Algorithm V (von Neumann). Samples E from the exponen- 
tial distribution e~ x for x > 0. 

VI. [Initialize rejection count.] Set I <— 0. 

V2. [Sample fraction.] Set x <— U, where U is a uniform 
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deviate U € (0,1). 

V3. [Generate a run.] Sample uniform deviates Ux, U2, ■ ■ ■ 
and determine the maximum value n > such that 
x > Ui > U 2 > ■ ■ ■ > U n . 

V4. [Test length of run.] If n is even, set E I + x and 
return; otherwise set I 4— I + 1 and go to step V2. I 

(Because the algorithm generates continuous random devi- 
ates, there's no distinction between the inequalities x > 
and x > or the intervals (0, 1) and [0, 1].) According to 
von Neumann, this algorithm was suggested by the game of 
Black Jack and this connectio n is made plain in the slightly 
different formulation given in Abra mowitz and Stegunl(ll964[ 
§26.8.6.c(2)). We shall see in Sec. [2] that by treating a; as a 
sequence of random bits, Algorithm V can be easily adapted 
to sample exponential deviates exactly. 

Sev eral authors have generalized yon Neumann's algo- 
rithm ( Ahrens and Diete rl ll973l iBrentl 1 19741 iForsvtiiel Il972t 
lMonahanlll979h . However, these efforts entail using ordinary 
floating point arithmetic and thus the methods do not generate 
exact deviates. In this paper, I show that the algorithm can 
be extended to sample exactly from the unit normal distribu- 
tion. Although the resulting algorithm is unlikely to displace 
existing methods for most applications, the algorithm will be 
useful wherever exactness is required and when using arbi- 
trary precision arithmetic. It is also of theoretical interest as 
an example of an algorithm where exact transcendental results 
can be achieved with simple integer arithmetic. 

2. VON NEUMANN'S ALGORITHM 

I start with a brief proof of von Neumann's algorithm and 
an improvement to it. The crucial step is step V3. The prob- 
ability that Ux, ■ ■ ■ , U n are all less than x is x n (provided that 
x E [0, 1]). The probability that, in addition, they are in de- 
scending order (one of the possible n! permutations) is x n /n\. 
For the condition to hold for a sequence of n + 1 numbers, 
it must hold for the first n of them; therefore the probabil- 
ity that the length of the longest decreasing sequence is n is 
x n /n\ — x n+1 1 (n + 1)1. For a given x, the probability that n 
is even is 



while the probability that n is odd (averaged over x) is 1 — 
J e~ x dx = er 1 . Thus the probability that the algorithm 
terminates with a particular value of I and x is exp(— (I + x)) 
as required. 

On average, this algorithm requires e 2 /(e — 1) « 4.30 uni- 
form deviates; in effect, the algorithm sums all the terms in the 
Taylor series for e~ x in a finite mean time. Conventionally, U 
would be sampled from the subset of reals which are repre- 
sentable as double precision floating point numbers; in this 
case, the results would be only approximately equivalent to 
sampling exactly from the exponential distribution and round- 
ing the results to the closest floating point number. However, 
because the only operation performed on the uniform deviates 



is a comparison, the uniform deviate can be generated one bit 
at a time. Typically, the comparisons can be decided by gen- 
erating only a few bits of each operand. At the completion of 
the algorithm, I and some initial set of the bits in the fraction 
of x are known. The rest of x can be filled in with uniform 
random bits. In this way, the algorithm generates exponen- 
tial deviates which are exact in the sense discussed in con- 
ne ction with Table [pin S ec.Q] This was the insight provided 
bv lKnuth and Yaol d 19761) and the m ethod was extensively an- 
alyzed by Flaiol et and Sahebl (Il986h . They showed that the 
mean number of bits needed to generate m bits of the result 
(coded in Knuth and Yao's unary-binary notation) is to + 7 
where 7 « 5.680 is the "balance" of the algorithm. Here I 
generalize this methodology by representing the uniform de- 
viates in some base b with digits uniformly distributed in [0, b) 
being added as needed. Interesting values of b are typically 2 
(adding a bit at a time) and 2 32 (adding a random "word" at a 
time). 

It's possible to make von Neumann's algorithm slightly 
more efficient by using early rejection. 

Algorithm E (improved von Neumann). Improved algorithm 

for sampling E from e~ x for x > 0. 

El. [Initialize rejection count.] Set I <— 0. 

E2. [Sample fraction.] Set x <s— U, where U is a uniform 

deviate U € (0,1). 
E3. [Early rejection.] If x > |, set I <— I + 1 and go to step 

E2. 

E4. von Neumann's step V3. 

E5. [Test length of run.] If n is even, set E <— t-I + x and 
return; otherwise set I •<— I + 1 and go to step E2. I 

The early rejection step results in lowering the mean number 
of uniform deviates required to e/( v / e — 1) s» 4.19. The 
balance is reduced to about 3.906. 

Von Neumann's algorithm can be adapted to generate a 
Bernoulli trial with probability 1 / y/e, as follows. 

Algorithm H (a half- exponential Bernoulli trial). A 
Bernoulli trial H which is true with probability 1 / yfe. 
HI. [Generate a run.] Sample uniform deviates Ux, U2, ■ ■ ■ 
and determine the maximum value n > such that \ > 

r, -r, ... u n . 

H2. [Test length of run.] Set H A— (nis even). I 

On average, the algorithm uses \e/{y/e — 1) (resp. ^e) uni- 
form deviates if the result is false (resp. true); the overall 
weighted average is y/e. If b = 2, the test uses about 3.786 
(resp. 2.236) bits on average if the result is false (resp. true), 
i.e., 2.846 bits overall. With probability |, this algorithm ex- 
its after sampling just one bit (i.e., Ux > i and n = 0); this 
accounts for the relative efficiency of Algorithm E compared 
to Algorithm V. Algorithm H will be used for sampling from 
the normal distribution. 



3. SAMPLING FROM THE NORMAL DISTRIBUTION 

Here I tackle the problem of sampling normal deviates us- 
ing only comparisons of uniform deviates and simple integer 
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arithmetic, so that exact samples can be obtained. First of 
all, it is clear that determining the integer part of the expo- 
nential deviate in von Neumann's algorithm from the num- 
ber of rejections depends on the property of exponentials, 
cxp(— (k + x)) = exp(— k) exp(— x), which does not hold 
for other distributions. This suggests that the sign and integer 
part of the normal deviate be separately sampled which leads 
to the following skeleton of an algorithm. 

Algorithm N (normal sampling). Sample N from a unit nor- 
mal distribution <p(x) using a rejection method. 

Nl. [Sample integer part of deviate k.] Select integer k > 
with probability exp(— \k)(l — 1/^/e). 

N2. [Adjust relative probability of k by rejection.] Accept k 
with probability exp(— ^k(k — 1)); otherwise go to step 
Nl. 

N3. [Sample fractional part of deviate x.] Set x <— U, where 
U is a uniform deviate U E (0, 1). 

N4. [Adjust relative probability of x by rejection.] Accept x 
with probability exp(— ^x(2k + x)); otherwise go to step 
Nl. 

N5. [Combine integer and fraction.] Set x <— k + x. 

N6. [Assign a sign.] With probability |, set x < x. 

N7. [Return result.] Set N <- x. I 

The analysis of this algorithm is simple. After step N2, the 
relative probability of k is exp(— ifc) x exp(— Afc(fc — 1)) = 
exp(— ^fc 2 ) for k > 0; after step N4, the relative prob- 
ability of [k,x] is exp(— ifc 2 ) x exp(— ^x(2k + x)) = 
exp(— i(fc + x) 2 ) for k > and x E (0,1). From this, 
it follows that the returned value of x has a Gaussian distri- 
bution, (j>(x). Step N2 always succeeds for k = and 1, 
the two most common cases. Overall, the probability that 
step N2 succeeds is (1 — l/ v / e)G « 0.690 where G = 
2~^^ exp(— ifc 2 ) w 1.75. Similarly, step N4 succeeds with 
probability y/n/2/G ~ 0.715. Thus, step Nl is executed 
^2/n/ (1 — 1/ yfe) sa 2.03 times on average. 

Steps Nl and N2 can be expressed in terms of half- 
exponential Bernoulli trials H with 

Steps Nl and N2 in terms of H. 

Nl. [Test H until failure.] Generate a sequence of Bernoulli 
deviates Hi , H2 , ■ ■ ■ and determine the largest k > such 
that Hi , H2 , ■ • ■ , Hk are all true. 

N2. [Make fc(fc-l) tests of H.] Set k' <- fc(fc-l) and gener- 
ate up to k' Bernoulli deviates Hi, H2, ■ ■ ■ , H^. Accept 
k if Hi is true for all i E [1, k']; otherwise go to step Nl. 

Steps N3, N5, and N6 are simple to implement (for details, 
see Sec.|4]l. This just leaves step N4 which is changed to 

Rewriting step N4. 

N4. [Break N4 into k + 1 steps.] Perform up to k + 1 
Bernoulli trials, Bi, B2, ■ ■ ■ , Bk+i, each with probabil- 
ity exp(— x(2k + x)/(2k + 2)). Accept x if Bi is true 
for alH E [1, k + 1]; otherwise go to step Nl. 



This transformation of N4 is motivated by the requirement in 
the proof of von Neumann's method that x E [0,1]. Repeating 
the trial k + 1 times means that the argument to the exponen- 
tial in the original step N4 is divided by k + 1; note that the 
maximum value of x(2k + x)/(2k + 2) (as x is varied) is 
(2fc + l)/(2fc + 2) < 1. 

In order to carry out a Bernoulli trial B, I generalize von 
Neumann's procedure. 

Algorithm B (generalizing von Neumann's step V3). A 
Bernoulli trial with probability cxp(— x(2k + x)/(2k + 2)). 
Sample two sets of uniform deviates Ui,U2,--- and Vi, 
V2, ■ ■ ■ and determine the maximum value n > such that 

x > Ui > U 2 > ■ ■ . > U n and V t < (2k + x)/(2k + 2) for 
all i E [l,n]. 

BI. [Initialize loop.] Setp x, n 0. 

B2. [Generate and test next samples.] 

(i) Sample q U; go to step B4, unless q < p. 

(ii) Sample r ^— U; go to step B4, unless r < 

(2fc + ir)/(2fc + 2). 

B3. [Increment loop counter and repeat.] Set p 4— q, n -s— 

n + 1; go to step B2. 
B4. [Test length of runs.] Set B 4— (n is even). I 

Without step B2(ii), steps BI to B3 are just step V3 of von 
Neumann's algorithm. Because of the additional test B2(ii), 
the probability that the nth trip through the loop succeeds is 
x n /n\ x ((2fc + x)/(2k + 2))". The requirement that n be 
even means that B succeeds with probability 

2k + x x 2 (2k + x\ 2 x 3 (2k + x\ 3 
l ~ X 2k + 2 + Y.\2k + 2J ~ 3!V 2fc + 2 / + "' 

/ 2k + x\ 
= eXP {- X 2kT2)- 

In order to avoid performing arithmetic on uniform deviates 
in step B2(ii), we note that as x varies in (0, 1) the right side 
of the inequality varies from 2k /(2k + 2) to (2k + I) /(2k + 
2). Thus, regardless of the values of x and r, the test will 
succeed with probability 2k/ (2k + 2) and fail with probability 
1 / (2k + 2). The remaining probability, 1 / (2k + 2), is divided 
between success and failure according to r < x. Thus the test 
r < (2k + x)/ (2k + 2) can be replaced with 

Algorithm T (the test in B2(ii)). Perform test T = (r < 
(2k + x)/ (2k + 2)) without doing arithmetic on real numbers. 

Tl. [Sample a selector /.] Set / <- C(2k + 2) where C(m) 
is —1 with probability 1 — 2/m, with probability 1/ra, 
and 1 with probability 1/m. 

T2. [Act on the value of /.] If / < 0, set T <- false; else 
if / > 0, set T <- true; otherwise (/ = 0), set T <- 
(r < x). I 

Finally, C (to) can be computed with 

Algorithm C (the 3-way selector). The choice C(to), 
(—1,0,1) with probabilities (1/to, 1/to, 1 — 2/m), imple- 
mented as the test w < n/ra where it; is a uniform deviate 
in (0,1) and n = m — 2 or n = to — 1. For each successive 
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digit d of w, substitute w = (d + w')/b so that the test be- 
comes w' < n'/m, where n' = bn — dm, and exit as soon as 
the n' is outside the range (0, m). 

CI. [Set the numerators of the fractions.] Set ni +- m — 2 
and n% <— m — 1. 

C2. [Sample the next digit of w, d.] Sample d <— D where D 
is a uniformly distributed integer in [0, b). 

C3. [Multiply inequalities by bm.] Set n\ <— bn\ — dm and 

n, 2 <— bn 2 — dm. 
C4. [Test the new numerators.] If n x > m, set C(m) <— 1 

and return; else if n 2 < 0, set C (m) < 1 and return; 

else if ni < and n 2 > m, set C(m) <— and return; 

otherwise, go to step C2. I 

Step B2 can now be written as 

Step B2 incorporating algorithm T. 

B2. [Generate and test next samples.] 

(a) Sample q <— U; go to step B4, unless q < p. 

(b) Set / <- C(2fc + 2); if / < 0, go to step B4. 

(c) If / = 0, sample r <— [7 and go to step B4, unless 

r < x. 



4. IMPLEMENTATION 

The appendix includes an implementation of Algorithm N 
in C++ together with a simple test program and its output. 
The class unit_normal_dist generates the normal de- 
viates closely following the description in Sec. |3j it uses a 
class rand_digit which produces pseudo random digits in 
a base b and a class rand_number which holds a random 
deviate. The latter class is key to generating exact deviates; 
a random deviate is represented as [s, n, L, do, d\, . . . , dz,-i] 
which corresponds to the uniform deviate 

L-l 

x = s(n + J2 W*' 1 + b ~ Lu ) > 

2=0 

where s = ±1, n and L are non-negative integers, and 
di E [0,b). Here, as above, U represents a uniform deviate 
U G (0, 1). Implicit in this representation is a mechanism 
by which L may be increased and additional digits di can be 
generated. In this implementation, operators which manipu- 
late a rand_number and may need to generate additional 
bits are passed a rand_digit for this purpose. The opera- 
tion x <— U corresponds to s <— 1, n <— 0, and L <— and 
is performed by init. Setting the integer part of a deviate 
(step N5) and switching its sign (step N6) are performed by 
set_integer and negate. Comparing two random devi- 
ates x < t (needed in Algorithms H and B) is performed by 
less_than. If the sign and integer parts of x and t match, 
then successive digits of the fractions are compared until the 
digits are unequal, increasing L for x or t as necessary. Fi- 
nally less_than_half tests x < | (in Algorithm H and 
in determining the sign in step N6); if b is even this reduces to 
do < b/2 and the generalization to all b is straightforward. 



Because comparisons may cause digits to be generated, 
care must be taken to avoid copying random deviates. For 
this reason, the assignment p +- x in step B 1 is incorporated 
into step B2. The order of the comparisons in step B2, at the 
end of Sec. [3] is somewhat arbitrary; in this implementation, 
the number of random digits needed is reduced by reversing 
the order of B2(a) and B2(b) whenever k = 0. The normal 
deviates produced by unit_normal_dist are instances of 
rand_number with enough digits added to the fraction to al- 
low Algorithm N to complete. Digits can be added to the nor- 
mal deviate as illustrated in the test program. Table [T] shows 
sample normal deviates produced by this implementation with 
rand_digit modified to use the tabulated random digits. 

In examining the properties of Algorithm N, let us start 
with the case 6 = 2. Measuring the number of bits used 
on each invocation of the algorithm, I find that the frequency 
decays exponentially with a e-folding constant of about 29.9 
bits. (This exponential behavior is expected given the nature 
of the algorithm.) Thus the mean running time is finite — 
in fact, this implementation uses about 30.000 random dig- 
its on average and the average number of bits in the frac- 
tion is (L) = 1.556. From these two quantities we can 
obtain the balance of the algorithm. For this purpose, the 
normal deviate is represented in a unary-binary format us- 
ing n + L + 2 bits (this includes 1 bit for the sign and 1 
bit to separate the integer and fraction). For Gaussians, the 
mean value of n is P = (LMJ) ~ 0.365, and the balance is 
30.000- (P+ 1.556 + 2) w 28.079. Another useful metric is 
the number of bits needed to generate a correctly rounded re- 
sult in a floating point representation with p bits in the floating 
point fraction. The mean floating point exponent of a normal 
deviate is Q = ([log 2 + 1 ~ —0.416, and the average 
number of random bits needed to produce a rounded floating 
point result is then 30.000 -(Q + 1.556) +p+l « 29.861+p. 
(The addition of 1 on the left side of this expression accounts 
for the extra bit needed for rounding the result.) For IEEE 
double precision floating point numbers, we have p = 53, 
and, on average, 82.861 random bits are required per rounded 
normal deviate. 

Algorithm N effectively decomposes 4>{x) into a set of uni- 
form distributions which are shown in Fig. Q] for the case 
6 = 2. The right half of this figure, x > 0, shows the de- 
composition given by the implementation in the appendix. 
The left side, x < 0, shows the decomposition given by 
the slightly different implementation of step B2 provi ded by 
the E xactNormal class in RandomLib version 1.6 dKarnevi 
120121) . In this version, the comparisons are reordered for 
n = in order to minimize the number of bits added to x. 
This reduces (L) to 1.187; however, the average number of 
random bits used increases to 30.104 and thus, overall, this 
variant is more expensive by about 0.47 bits. Unless the spe- 
cific goal is to minimize the number of bits in the result, the 
method given in the appendix is preferred. 

Turning now to the opposite limit, consider the case b = 
2 32 . Table [2] shows some comparative timings for produc- 
ing normal deviates with a precision of p bits in either the 
IEEE floating point f ormat or the floating point representa- 
tion of IMPFrI d201 ll) . a library for arbitrary precision arith- 



FIG. 1 Algorithm N's decomposition of the normal distribution into a set of uniform distributions with b = 2. For example, the frequency 
with which +0.0 ... is returned is equal to the relative area of the rectangle spanning x € (0, ±) which is ± ^flfH « 5%. The left and right 
portions of this figure correspond to two different implementations of Algorithm N (see the text). On the left (resp. right) half of the figure, the 
minimum range of the uniform distributions shown is 2~ 7 (resp. 2~ s ). 



metic dFousse et al. , 2007). Column A uses the ratio method, 
with the optimizations given bv iLeval (1 1992b . and ordinary 
floating point arithmetic. Columns B and C use Algorithm 
N with the results rounded to IEEE floating point numbers 
(column B) or to MPFR's mpfr_t (column C). Column 
D uses the routine, mpf r_grandom, from MPFR version 
3.1.0, for sampling normal deviates. The source of uniform 
rando m numbers for these methods is the SFMT 19937 algo- 
rithm (Saito and Matsumoto, 2008) included in RandomLib 
for columns A and B, and the mpz_urandomb routine in 
Gnu MP library for columns C and D. The tests were carried 
out on a Fedora Linux system running on an Intel 2.66 GHz 
CPU. As expected, the scaling of the time for Algorithm N in 
column C is offset linear, approximately (1 + 300p/2 20 ) /is; 
this makes Algorithm N among the least expensive operations 
for an arbitrary precision library. 

5. DISCUSSION 

I consider three criteria by which Algorithm N might 
be judged, the complexity of the algorithm, its speed, 
and its accuracy. The core of Algorithm N, the class 
unit_normal_dist, is about a page of code. It is roughly 
10 times longer than many classical methods of sampling 
normal deviates and abo ut 2-3 times longer t hat so me of 
the faster methods, e.g., iMarsaglia and Tsangl ([2000). On 
the other hand, it is shorter than converting one of the stan- 
dard methods to yield exactly rounded results, epitomized by 
mpf r_grandom which uses the polar method, because of 
the extra code needed to guarantee that an accurate result is 
returned. Of course, in some respects, Algorithm N can be 
regarded as the simplest of the available algorithms; because 
it uses only integer operations, you can compute accurate nor- 
mal deviates by tossing a coin and using just pencil and paper! 



TABLE 2 Times (in /is) for sampling from the normal distribution. 
p is the number of bits in the fraction of the rounded floating point 
samples. Columns A-D use different methods as explained in the 
text. 



p 


A 


B 


C 


D 


24 


0.034 


0.38 


0.96 


3.5 


32 






1.0 


3.7 


53 


0.041 


0.44 


1.0 


3.9 
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0.044 


0.45 


1.0 


4.2 


128 






1.0 


5.7 


256 






1.1 


8.5 


2 io 






1.3 


26 


2 12 






2.2 


150 








5.7 


1400 


2 ie 






20 


15000 


2 18 






79 


120000 


2 20 






300 


820000 



Algorithm N is an order of magnitude slower than con- 
ventional methods for sampling normal deviates (compare 
columns A and B in Tabled, which is hardly surprising given 
its low level representation of random deviates. However it is 
considerably faster than mpf r_grandom at exact sampling, 
particularly at high precision (compare columns C and D in 
Table |2). If the source of randomness is a slow hardware 
generator, then the timing comparison reduces to measuring 
the use o f random bits. By this metric, the best conventiona l 
methods dBox and Mullerl Il958t iMarsaglia and Tsangl l2000l) 
consume one uniform deviate for each normal deviate, typi- 
cally requiring 53 random bits for each result at double pre- 
cision. In contrast, Algorithm N (with 6 = 2) uses 83 bits 
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which would be 60% slower; however, Algorithm N delivers 
accurate bounds on the the deviate using just 30 or so bits and 
thus it's possible that in some applications Algorithm N will 
be faster. 

Algorithm N provides a novel way of exactly sampling nor- 
mal deviates. Several classical methods can be modified to 
achieve the same results by using arbitrary precision arith- 
metic; however we saw from column D of Table [2]that this can 
be very slow. One method for sampling normal deviates which 
can be made exact wi th similar perfo rmance to Algorithm N is 
given b y kahnl d 1 956l p. 41); see also lAbramowitz and Stegunl 
d 1964 §26.8.6.a(4)). The algorithm is: sample x and x' from 
the exponential distribution and, if (x — l) 2 < 2x', attach 
a random sign to x and return the result. Unfortunately, the 
comparison entails performing arithmetic on x which requires 
the use of arbitrary precision arithmetic. On the other hand, 
the exponential deviates may be sampled using Algorithm E 
and the comparison can be accomplished adding only a few 
digits (on average) to x and x 1 . This means that it shares 
with Algorithm N the ideal scaling of cost with precision. I 
have implemented this algorithm using MPFR and find that 
it is more bit efficient than Algorithm N; despite this, the 
constant term in the expression for the time per normal devi- 
ate is slightly larger than Algorithm N. (Incidentally, the first 
two steps of Algorithm N are essentially an integer version of 
Kahn's method.) 

Many Monte Carlo simulations make heavy use of normal 
deviates. However such simulations would have to be extraor- 
dinarily long to distinguish between a conventional double 
precision version of the ratio method, for example, and Algo- 
rithm N with the results rounded to double precision. Given 
that it is about 10 times slower, there is little reason to prefer 
Algorithm N in most such simulations. However in some spe- 
cialized lapplicjrtiojis 2 _e 1 g^^ 

quire d dGranboulan and Pornini l2007t iMicciancio and Regev , 
2009), and then Algorithm N offers a good solution (often in 
conjunction with a library such as MPFR). 

This paper presents an algorithm for exactly sampling nor- 
mal deviates. This means that the three major maximum en- 
tropy distributions, uniform (trivially), exponential (via Algo- 
rithm E), and normal (via Algorithm N), can be sampled ex- 
actly with perfect asymptotic scaling. This makes them ideal 
for libraries for arbitrary precision arithmetic, such as MPFR, 
because the results can be exactly rounded to a precision of 
p bits with a cost which is 0(p). However, it might often be 
advantageous to delay the extension and rounding of the ran- 
dom deviates (with the associated costs and loss of accuracy). 
The output of these algorithms gives the first few digits of the 
random deviate and a rule for generating additional digits. As 
such, it occupies 0(1) storage and is still exact. Furthermore, 
certain operations can be performed on such random deviates 
at O(l) cost, an example being the comparison between ex- 
ponential deviates in Kahn's algorithm; it would be of inter- 
est to find more operations on random deviates that can be 
cheaply and exactly carried out. The resulting "lazy evalua- 
tion" framework would, in principal, require less storage, be 
faster, and be exact. (It's worth reiterating here that the exact- 
ness of these algorithms depend on the availability of an ideal 



source of random digits.) 

Algorithm N relies on von Neumann's algorithm in the re- 
jection methods used to select the integer and fractional parts 
of the normal deviate. The new techniques introduced are 
breaking step N4 into k + 1 tests, to reduce of the argument 
of the exponential, and adding a second set of tests, in step 
B2(ii), to compute a more complex exponential probability. 
Presumably algorithms similar to Algorithm E and N can be 
found for other distributions although, as yet, there is no sys- 
tematic approach to finding such algorithms. Related work by 
Flaiol et et al. I (120 lib discusses several interesting methods for 
sampling discrete distributions and considers ways in which 
they can be combined. For example, the authors give an al- 
gorithm for a Berno ulli trial with prob ability 1/tt, based on 
a formula given by Ramanujan| (Il914l Eq. (28)), which re- 
quires just under 10 bits, on average. It's probable that some 
of their techniques will be useful in finding algorithms for 
sampling from continuous distributions; they might also lead 
to improvements to Algorithm N for normal deviates. 
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Appendix A: Sample implementation 

Here I give listings of exact_normal_dist . hpp 
which implements Algorithm N and of a test program 
exact_normal_test . cpp. Also shown is some sample 
output. 

The implementation is via a header file, which assumes 
a standard C++ environment. Lines 9-17 define a class 
rand_digit to produce random digits in a base b. Lines 
19-70 define a class rand_number with realizes a random 
deviate with an arbitrary number of digits. Finally, lines 72- 
133 define the class unit_normal_dist which samples 
from the normal distribution. In order to simplify the exposi- 
tion, rand_digit uses std: : rand which, typically, has 
a short period; in "production" use, this should be replaced 
by a generator with a much longer period, for exam ple, the 
Mersenne Twister (Matsumoto and Nishimura, 1998), which 
is included in the standard library for C++1 1, or by a source 
of truly random digits. Also, in order to avoid overflow in Al- 
gorithm C, the base b needs to be small enough that (2k + 2)6 
does not overflow; this limitation is easily removed. 

The test program computes and prints 50 normal deviates 
exactly rounded to 10 decimal digits. This is achieved by gen- 
erating 1 1 digits in the fraction and rounding the mid-point of 
the resulting range. 

NOTE: The following files can be obtained by visiting 
http://arxiv.Org/a/karney_c_l, selecting the "other" option for 
this paper, and clicking on "Download source." Compilation 
instructions are given in 00README.txt in the resulting 
archive. 



1. Header File, exact_normal_dist .hpp 



#if ! defined (EXACT_NORMAL_DIST_HPP) 
#define E XAC T_N0RMAL_D I S T_HP P 1 



#include <vector> 
5 #include <algorithm> 
#include <utility> 
#include <cstdlib> 



// for fraction in rand_number 

// for std: -.swap, std: :min, std: : max 

// for std: -.pair 

/ / for std: : rand 
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// a class to produce random digits in [0,b) 
10 template<int b> class rand_digit { 

static const int _div = RAND_MAX / b, _max = _div * b; 
public: 

rand_digit (unsigned s) // constructor takes a seed 

{ std: : srand (s) ; } 
15 int operator) ) () //a random digit 

{ int r; while ( (r = std::rand()) >= _max) ; return r / _div; } 

}; 

// an arbitrary precision random number 
20 template<int b> class rand_number { 

int _s, _n; // sign and integer part; N.B. _n >= 

std : : vector<int> _d; // the digits in the fraction 

public: 

rand_number ( ) : _s ( 1 ) , _n ( ) {} // construct as (0,1) 

25 void init() { _s = 1; _n = 0; _d. clear (); } // reset to (0,1) 
void swap (rand_number& t) { 
if (this == &t) return; 

std :: swap (_s , t._s); std :: swap (_n, t._n); _d . swap (t . _d) ; 

} 

30 rand_number& operator^ (const rand_number& t) // safe copy assignment 

{ rand_number r(t); swap(r); return *this; } 

int sign() const { return _s; } 

void negate () { _s = -_s; } 

int integer () const { return _n; } 
35 void set_integer (int n) { _n = n; } // require n >= 

int ndigitsO const { return _d. size () ; } 

// return digit in fraction generating it if necessary 

int digit (rand_digit<b>& D, int k) { 

for (int i = ndigitsO; i <= k; ++i) _d . push_back (D ( ) ) ; 
40 return _d[k] ; 

} 

// test *this < t 

bool less_than (rand_digit<b>& D, rand_number& t) { 
if (this == &t) return false; 
45 if (_s != t._s) return _s < t._s; 

if (_n != t._n) return (_s < 0) " (_n < t._n); 
for (int k = 0; ; ++k) { 

int x = digit (D, k) , y = t. digit (D, k) ; 
if (x != y) return (_s < 0) " (x < y) ; 

50 } 

} 

// test *this < 1/2 

bool less_than_half (rand_digit<b>& D) { 
if (_s < 0) return true; 
55 if (_n > 0) return false; 

for (int k = 0; ; ++k) { 
int d = digit (D, k) ; 
if (d < b / 2) return true; 
if (d >= (b + 1) 12) return false; 

60 } 

} 

// the uniform range represented by the number 
std: :pair<double, double> range ( ) const { 

double x = , d = 1 ; 
65 for (int k = _d.size(); k— ; ) { x = (_d[k] + x) / b; d /= b; } 

x += _n; 

return std: :pair<double, double> (_s > ? x : - (x + d) , 

_s > ? (x + d) : -x) ; 

} 

70 } ; 
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// sample exactly from the normal distribution 
template<int b> class unit_normal_dist { 
public: 

75 



rand_number<b> operator) ) ( rand_digit<b>& D) const { 
for (;;) { 





int k = G (D) ; 


// 


step 


1 




if (!P(D, k * (k - 1))) continue; 


// 


step 


2 


80 


_x . init ( ) ; 


// 


step 


3 




int j = k + 1; while (j — && B(D, k, 


_x) ) ; // 


step 


4 




if (! (j < 0)) continue; 










_x . set_integer (k) ; 


// 


step 


5 




if (_p.init(), _p . les s_than_hal f ( D ) ) 


_x . negate ( ) ; // 


step 


6 


85 


return _x; 


// 


step 


7 



} 

} 

private: 

90 mutable rand_number<b> _p, _q, _x; // temporary storage 

// Algorithm H: true with probability exp (-1/2) . 
bool H ( rand_digit<b>& D) const { 

if (_p.init(), ! _p . les s_than_hal f (D ) ) return true; 
95 for (;;) { 

if (_q.init(), !_q. less_than (D, _p) ) return false; 
if (_p.init(), !_p. less_than (D, _q) ) return true; 

} 

} 

100 

// Step Nl: return n >= with prob. exp(-n/2) * (1 - exp (-1/2) ) . 

int G (rand_digit<b>& D) const 

{ int n = 0; while (H (D) ) ++n; return n; } 

105 // Step N2 : true with probability exp (-n/2 ) . 
bool P (rand_digit<b>& D, int n) const 
{ while (n — && H ( D ) ) ; return n < 0; } 

// Algorithm C: return (-1, 0, 1) with prob (1/m, 1/m, 1-2/m) . 
110 static int C ( rand_digit<b>& D, int m) { 
int nl = m - 2, n2 = m - 1; 
for (;;) { 
int d = D ( ) ; 

if ( (nl = std: :max(0, nl * b - d * m) ) >= m) return 1; 
115 if ( (n2 = std::min(m, n2 * b - d * m) ) <= 0) return -1; 

if (nl <= && n2 >= m) return 0; 

} 

} 

120 // Algorithm B: true with prob exp (-x * (2*k + x) / (2*k +2)). 
bool B ( rand_digit<b>& D, int k, rand_number<b>& x) const { 
int n = 0, m = 2*k + 2, f; 
for (;; ++n) { 

if ( ( (f = k ? : C (D, m) ) < 0) I 
125 (_q.init(), !_q . less_than (D, n ? _p : x) ) | 

( (f = k ? C(D, m) : f) < 0) | | 

(f == && (_p.init(), ! _p . les s_than (D , x) ) ) ) break; 
_p.swap(_q); // an efficient way of doing p = q 

} 

130 return (n % 2) ==0; 

} 

}; 



135 #endif 
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2. Test Program, exactjiorraal.test . cpp 

#include <ctime> // for std: :time 

finclude <iostream> // for std::cout 

finclude <iomanip> // for std: : fixed, std: : setprecision 

finclude "exact_normal_dist . hpp" 

5 

int main ( ) { 

const int b = 10, p = 10, n = 50; 
unsigned s = std: : time ( ) ; 

rand_digit<b> D(s); rand_number<b> x; unit_normal_dist<b> N; 
10 std: : cout << n << " normal deviates rounded to " 

<< p << " decimal digits (seed = " << s << ")\n\n" 
<< std:: fixed << std: : setprecision (p) << std: : showpos; 
for (int i = 1; i <= n; ++i) { 

x = N(D); // generate normal deviate 

15 x. digit (D, p) ; // generate p+1 digits in the fraction 

std: :pair<double, double> r = x . range () ; 

std::cout << (r. first + r. second) / 2 << (i % 5 ? ' ' : ' \n' ) ; 



} 

return 0; 



20 



3. Sample Output 

50 normal deviates rounded to 10 decimal digits (seed = 1363577353) 

+0.5900895626 -2.84 75038 632 -1.2 625 9577 42 +0.04 9072527 5 -0.7481913286 

+0.3615463565 +0.24 7 6988 69 6 +1.54 32 6 990 93 +0.6572 955312 +2.0787863832 

-1.0229481519 -0.4487782598 +1.0177 72 9117 -1.88 642 5502 3 -0.3178039375 

-0.5808291519 -0.92 02 65 6 614 -0.2466482059 -0.8982221741 +0.2979553171 

+1.1007225649 -0.0181433259 +0.1360926971 +0.3408369630 -0.6025914923 

-2.3302632498 +0.258 68 01142 +0.55584 64123 +1.0995017431 -0.7 9123 6354 8 

-0.9253888913 +0.5205200343 +0.1950752 917 -0.4 54 97 50103 +1.5675432394 

+0.6598556315 +0.3984283840 +1.1167816249 -0.1015855019 +1.3171410458 

-0.3099378684 +0.2188000064 -1.3685027317 -0.4542858594 -0.0287872347 

-0.5041824162 -0.05434 69225 -0.8011012065 -0.527 4 08 63 69 +0.2397280757 



